Maximal Data Piling in Discrimination
Abstract
In a binary discrimination problem, a linear classifier finds a hyperplane that separates the two classes by partitioning the data space. Especially in a High Dimension Low Sample Size (HDLSS) setting, there exist separating hyperplanes such that the projections of the training data points onto their normal direction vectors are identically zero, or equal to some non-zero constant. Of interest in this paper is a separating hyperplane such that the projections of the training data points from each class onto its normal direction vector take two distinct values, one for each class. This direction vector is uniquely defined in the subspace generated by the data, and a simple formula is given to find it. In non-HDLSS settings, this direction vector is the same as the Fisher Linear Discrimination direction vector.
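The abstract does not reproduce the formula itself. As a rough illustration only, the NumPy sketch below assumes one commonly cited form of the maximal data piling direction, v ∝ Σ⁺d, the Moore-Penrose pseudoinverse of the total sample covariance Σ of the pooled training data applied to the difference d of the two class means; the toy data and all variable names are assumptions for illustration, not taken from the paper.

```python
# Hedged sketch of the maximal data piling (MDP) direction, assuming
# the form v ~ pinv(Sigma_total) @ (mean_pos - mean_neg). The formula
# and all names here are illustrative assumptions, not quoted text.
import numpy as np

rng = np.random.default_rng(0)

# Toy HDLSS training set: dimension 50, only 10 points per class.
dim, n_per_class = 50, 10
X_pos = rng.normal(loc=+1.0, size=(n_per_class, dim))
X_neg = rng.normal(loc=-1.0, size=(n_per_class, dim))
X = np.vstack([X_pos, X_neg])

# Total (global) sample covariance of all training points together.
Xc = X - X.mean(axis=0)
Sigma_total = Xc.T @ Xc / (X.shape[0] - 1)

# Candidate MDP direction: pseudoinverse of the total covariance
# applied to the difference of the class means, then normalized.
mean_diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)
v = np.linalg.pinv(Sigma_total) @ mean_diff
v /= np.linalg.norm(v)

# Data piling: within each class, all projections onto v agree up to
# floating-point error, so the projections take exactly two values.
print(np.ptp(X_pos @ v))   # within-class spread, ~1e-12
print(np.ptp(X_neg @ v))   # within-class spread, ~1e-12
print(mean_diff @ v)       # gap between the two piled values
```

On this toy HDLSS sample the within-class spreads print as numerical zeros, i.e., each class piles onto a single projection value. When the covariance matrix is invertible (the non-HDLSS case), Σ⁻¹d is proportional to the Fisher Linear Discrimination direction by a Sherman-Morrison argument, consistent with the equivalence stated in the abstract.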
Similar Resources
Distance Weighted Discrimination
High Dimension Low Sample Size statistical analysis is becoming increasingly important in a wide range of applied contexts. In such situations, it is seen that the appealing discrimination method called the Support Vector Machine can be improved. The revealing concept is data piling at the margin. This leads naturally to the development of Distance Weighted Discrimination, which also is bas...
Distance Weighted Discrimination
High Dimension Low Sample Size statistical analysis is becoming increasingly important in a wide range of applied contexts. In such situations, it is seen that the popular Support Vector Machine suffers from “data piling” at the margin, which can diminish generalizability. This leads naturally to the development of Distance Weighted Discrimination, which is based on Second Order Cone Programmin...
Class-sensitive Principal Components Analysis
Dissertation by Di Miao, under the direction of J. S. Marron and Jason P. Fine. Research in a number of fields requires the analysis of complex datasets. Principal Components Analysis (PCA) is a popular exploratory method. However, it is driven entirely by variation in the dataset, without using any predefined class label information. Linear classifiers make up a fa...
Sparse Distance Weighted Discrimination
Distance weighted discrimination (DWD) was originally proposed to handle the data piling issue in the support vector machine. In this paper, we consider the sparse penalized DWD for high-dimensional classification. The state-of-the-art algorithm for solving the standard DWD is based on second-order cone programming; however, such an algorithm does not work well for the sparse penalized DWD with ...
Geometric Insights into Support Vector Machine Behavior using the KKT Conditions
The Support Vector Machine (SVM) is a powerful and widely used classification algorithm. Its performance is well known to be impacted by a tuning parameter which is frequently selected by cross-validation. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM in the large and small tuning parameter regimes. These insig...
Publication date: 2004